AITopics | visitation distribution

IOSTOM: Offline Imitation Learning from Observations Via State Transition Occupancy Matching

Neural Information Processing Systems | Jun-17-2026, 09:26:33 GMT

Offline Learning from Observation (LfO) focuses on enabling agents to imitate expert behavior using datasets that contain only expert state trajectories and separate transition data with suboptimal actions. This setting is both practical and critical in real-world scenarios where direct environment interaction or access to expert action labels is costly, risky, or infeasible. Most existing LfO methods attempt to solve this problem through state or state-action occupancy matching. They typically rely on pretraining a discriminator to differentiate between expert and non-expert states, which can introduce errors and instability, especially when the discriminator is poorly trained. While recent discriminator-free methods have emerged, they generally require substantially more data, limiting their practicality in low-data regimes.

artificial intelligence, dataset, machine learning, (14 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry: Transportation > Marine (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

08d562c1eedd30b15b51e35d8486d14c-Paper.pdf

Neural Information Processing Systems | May-1-2026, 01:40:46 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country: North America > United States (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Adversarial Intrinsic Motivation for Reinforcement Learning

Neural Information Processing Systems | Apr-25-2026, 17:32:29 GMT

Learning with an objective to minimize the mismatch with a reference distribution has been shown to be useful for generative modeling and imitation learning. In this paper, we investigate whether one such objective, the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution, can be utilized effectively for reinforcement learning (RL) tasks. Specifically, this paper focuses on goal-conditioned reinforcement learning where the idealized (unachievable) target distribution has full measure at the goal. This paper introduces a quasimetric specific to Markov Decision Processes (MDPs) and uses this quasimetric to estimate the above Wasserstein-1 distance. It further shows that the policy that minimizes this Wasserstein-1 distance is the policy that reaches the goal in as few steps as possible. Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates this Wasserstein-1 distance through its dual objective and uses it to compute a supplemental reward function. Our experiments show that this reward function changes smoothly with respect to transitions in the MDP and directs the agent's exploration to find the goal efficiently. Additionally, we combine AIM with Hindsight Experience Replay (HER) and show that the resulting algorithm accelerates learning significantly on several simulated robotics tasks when compared to other rewards that encourage exploration or accelerate learning.
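For reference, the dual objective mentioned in the abstract can be illustrated with the standard Kantorovich-Rubinstein form of the Wasserstein-1 distance. This is the textbook statement for a generic metric d, not the paper's exact formulation: the abstract says the paper substitutes an MDP-specific quasimetric for d, and that construction is not reproduced here. The symbols \rho_\pi (the policy's state visitation distribution), \rho_g (the target distribution concentrated at the goal), and f (the potential function) are notation assumed for this sketch.

```latex
W_1(\rho_\pi, \rho_g)
  = \sup_{f \,:\, |f(x) - f(y)| \le d(x, y)\ \forall x, y}
    \; \mathbb{E}_{s \sim \rho_\pi}\bigl[f(s)\bigr]
     - \mathbb{E}_{s \sim \rho_g}\bigl[f(s)\bigr]
```

The supremum ranges over 1-Lipschitz functions with respect to d, so an approximate maximizer f can be learned from samples of the two distributions, which is what makes the dual form convenient for deriving a reward signal.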

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Country: North America > United States > Texas > Travis County > Austin (0.15)

Industry: Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

1796a48fa1968edd5c5d10d42c7b1813-Supplemental.pdf

Neural Information Processing Systems | Apr-24-2026, 21:48:17 GMT

demonstration, machine learning, reinforcement learning, (15 more...)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

1796a48fa1968edd5c5d10d42c7b1813-Paper.pdf

Neural Information Processing Systems | Apr-24-2026, 21:48:13 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.75)

Adversarial Intrinsic Motivation for Reinforcement Learning

Neural Information Processing Systems | Feb-8-2026, 11:56:06 GMT

In this paper, we investigate whether one such objective, the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution, can be utilized effectively for reinforcement learning (RL) tasks.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Industry: Government (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

Visual Adversarial Imitation Learning using Variational Models

Neural Information Processing Systems | Feb-7-2026, 16:06:10 GMT

Behaviour cloning (BC) is a classic algorithm to imitate expert demonstrations [7], which uses supervised learning to greedily match the expert behaviour at demonstrated expert states. Due to environment stochasticity, covariate shift, and policy approximation error, the agent may drift away from the expert state distribution and ultimately fail to mimic the demonstrator [8].

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

1796a48fa1968edd5c5d10d42c7b1813-Paper.pdf

Neural Information Processing Systems | Feb-7-2026, 16:06:07 GMT

algorithm, demonstration, imitation, (14 more...)

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.75)

Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation

Qiang Liu, Lihong Li, Ziyang Tang, Dengyong Zhou

Neural Information Processing Systems | Nov-20-2025, 20:37:37 GMT

In this paper, we propose a new off-policy estimator that applies importance sampling (IS) directly on the stationary state-visitation distributions to avoid the exploding variance faced by existing methods.
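The idea of weighting by stationary state-visitation distributions, rather than by products of per-step action ratios along a trajectory, can be sketched on a small tabular MDP. The sketch below is an illustration under strong assumptions, not the paper's estimator: it computes the state density ratio w(s) exactly from a known transition tensor, whereas in practice the ratio has to be estimated from data. All names (`stationary_distribution`, `state_transition_matrix`, `stationary_is_estimate`, `demo`) and the random MDP are hypothetical.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution d of a row-stochastic matrix P (assumed ergodic)."""
    n = P.shape[0]
    # Solve (P^T - I) d = 0 together with sum(d) = 1 as one least-squares system.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.concatenate([np.zeros(n), [1.0]])
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d


def state_transition_matrix(T, pi):
    """Combine T[s, a, s'] with a policy pi[s, a] into a state chain P[s, s']."""
    return np.einsum("sa,sat->st", pi, T)


def stationary_is_estimate(states, actions, rewards, w, pi, pi0):
    """Self-normalized estimate of the average reward of pi from pi0 samples.

    Each sample gets weight w(s) * pi(a|s) / pi0(a|s): one state density
    ratio and one single-step action ratio, with no product over time steps.
    """
    weights = w[states] * pi[states, actions] / pi0[states, actions]
    return float(np.sum(weights * rewards) / np.sum(weights))


def demo(seed=0, n_steps=30000):
    rng = np.random.default_rng(seed)
    n_s, n_a = 3, 2
    T = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # T[s, a, :] sums to 1
    r = rng.uniform(size=(n_s, n_a))                  # reward table r[s, a]
    pi0 = np.full((n_s, n_a), 1.0 / n_a)              # behavior policy (uniform)
    pi = rng.dirichlet(np.ones(n_a), size=n_s)        # target policy

    # ASSUMPTION: the ratio is computed exactly from the known model here.
    d_pi = stationary_distribution(state_transition_matrix(T, pi))
    d_pi0 = stationary_distribution(state_transition_matrix(T, pi0))
    w = d_pi / d_pi0

    # Collect one long trajectory under the behavior policy.
    states = np.empty(n_steps, dtype=int)
    actions = np.empty(n_steps, dtype=int)
    rewards = np.empty(n_steps)
    s = 0
    for t in range(n_steps):
        a = rng.choice(n_a, p=pi0[s])
        states[t], actions[t], rewards[t] = s, a, r[s, a]
        s = rng.choice(n_s, p=T[s, a])

    estimate = stationary_is_estimate(states, actions, rewards, w, pi, pi0)
    truth = float(np.sum(d_pi[:, None] * pi * r))     # exact average reward of pi
    return estimate, truth


if __name__ == "__main__":
    est, truth = demo()
    print(f"estimate={est:.4f}  truth={truth:.4f}")
```

Because each sample carries a single bounded weight, the variance of this estimate does not grow with the trajectory length, which is the contrast with trajectory-wise importance sampling that the abstract draws.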

machine learning, reinforcement learning, trajectory, (16 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Industry: Transportation (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Supplementary Materials A Experiment As suggested by one reviewer, we conduct the following experiment over Cartpole in OpenAI gym to

Neural Information Processing Systems | Oct-2-2025, 14:02:27 GMT

The following lemma justifies item 3 in Assumption 1. Consider the following two cases: 1. Density function of the policy is smooth, i.e. We then show how Theorem 4 implies Theorem 1. Assumption 3. For all x ∈ X, there exist constants such that the following hold 1. For all x, we have null A Now we proceed to prove the main theorem. Then, given the above convergence result on the gradient norm, we proceed to prove the convergence of NAC in terms of the function value.

large language model, machine learning, natural language, (20 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.40)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.40)
